Performance Characterization of a Multithreaded Architecture: Where Are the Benefits?
Abstract
Multithreaded architectures hold the promise of high performance through an overlap of computation and communication. This paper explores how the overlap in multithreaded execution affects the performance of the processor, memory, and network subsystems; which parameters are critical to ensuring high processor performance; and what performance impact optimizations of the workload and architecture have. We use McGill's EARTH system as a case study. We conduct empirical measurements to show the real costs of multithreading, and evaluate the performance of multithreaded architectures using analytical models. Our detailed characterization of multithreaded execution includes new program workload parameters such as the number of remote (and local) accesses per thread. We use synthetic benchmarks to explore how the program workload parameters affect performance, individually and in combination. A small number of remote requests per thread and a high thread runlength are crucial to achieving high processor performance. While multithreading is expected to tolerate long latencies for remote data accesses, the latencies rise to as much as 200% above their no-load values, around 300 cycles, for fine-grain parallel program workloads on the EARTH system. Further, superscalar program execution reduces the computation time, thereby injecting messages more frequently and increasing the latencies. Consequently, the change in processor performance may be small. We identify the delays at the EARTH node as the performance bottlenecks, and explore two EARTH configurations that can yield higher gains. While the current EARTH implementation can tolerate the long latencies of a NOW system, reducing the delays at an EARTH node allows higher performance at finer granularities. With a performance characterization such as ours, a compiler can choose the right mix of parameter values to achieve the desired performance. Similarly, an architect can evaluate the trade-offs of realistic design implementations.
Similar resources
with Off-the-Shelf RISC
Multithreaded architectures have been proposed for future multiprocessor systems due to their ability to cope with network and synchronization latencies. Some of these architectures depart significantly from current RISC processor designs, while others retain most of the RISC core unchanged. However, in light of the very low cost and excellent performance of off-the-shelf microprocessors it seems...
A Prefetching Technique for Object-Oriented Databases
We present a new prefetching technique for object-oriented databases which exploits the availability of multiprocessor client workstations. The prefetching information is obtained from the object relationships on the database pages and is stored in a Prefetch Object Table. This prefetching algorithm is implemented using multithreading. In the results we show the theoretical and empirical benefit...
Simultaneous Multithreaded DSPs: Scaling from High Performance to Low Power
In the DSP world, many media workloads have to perform a specific amount of work in a specific period of time. This observation led us to examine how we can exploit Simultaneous Multithreading for VLIW DSP architectures to: 1) increase throughput in situations where performance is the most important attribute (e.g., base station workloads) and 2) decrease power consumption in situations where p...
Bus Utilization Analysis of Multithreaded Shared-bus Multiprocessors: Initial Results
A shared-bus shared-memory multiprocessor based on multithreaded CPUs is evaluated against different solutions for cache and coherence protocols. Multithreaded architectures have been intensively studied for DSM multiprocessors, where memory latencies are a major factor in limiting performance. They can also be of interest for bus-based multiprocessors, since processor speeds are increasing at a...
Measuring the Performance of Multithreaded Processors
Multithreaded architectures are becoming more and more popular. In fact, many processor vendors have already shipped processors with multithreaded features. Despite this push toward multithreaded processors, there is still no clear procedure that defines how to measure the behavior of a multithreaded processor. This paper presents FAME, a new evaluation methodology aimed to...